Process for determination of a complete or a partial contents of very
short sequences in the samples of nucleic acids connected to the discrete
particles of microscopic size by hybridization with oligonucleotide probes.



Determination of the formula of genomic DNA, i.e. genome sequencing, by
hybridization with oligonucleotide probes (YU Patent Application 570/87)
envisages the use of 100000 oligonucleotide probes and the same number of
hybridizations with 6000000 addressed sample-clones on filters in order to
determine the contents of oligonucleotide sequences in each clone. The
process presents improvements in the preparation of samples for
hybridization and improvements which enable one to follow gene expression
by determining partial or complete fragment sequences of genomic DNA, mRNA
or cDNA. By binding fragments of genomic DNA to discrete particles (DP) of
microscopic size which are recognizable in the step of reading the
experimental image, the necessity for addressed samples on filters is
dispensed with. This drastically reduces the automatic-robotic component
of the process and allows miniaturization of the entire method from the
level of an industrial installation to the level of a laboratory
instrument. Processes for binding DNA fragments to DPs recognizable in
common reactions allow the elimination of cloning, i.e. DNA amplification
in host cells, and, in the process of library formation, of the need to
form 6000000 addressed samples in any phase of a process for sequencing by
hybridization.
PROCESS FOR DETERMINATION OF A COMPLETE OR A PARTIAL CONTENTS OF VERY SHORT
SEQUENCES IN THE SAMPLES OF NUCLEIC ACIDS CONNECTED TO THE DISCRETE
PARTICLES OF MICROSCOPIC SIZE BY HYBRIDIZATION WITH OLIGONUCLEOTIDE PROBES



a) Field of the Invention 

The present invention belongs to the field of 
molecular biology. 

b) Technical Problem

The genome size varies from 4 x 10^6 nucleotides in bacteria (E. coli) to
3 x 10^9 nucleotides in mammals, including man. The determination of the
formula, i.e. the order of nucleotides in genomic DNA, is a first-class
technological challenge for the science of the end of the 20th century. It
is believed that the availability of the nucleotide sequence of the genome
would cause a qualitative rise in medicine, biotechnology and fundamental
biology itself, and would beneficially influence all fields of these
sciences. In contrast to a known method, which is not effective enough for
accomplishing this task, we claimed a method for sequencing by
hybridization (Yugoslavian Patent Application 570/87 and Amendment
No. 4521, 16.03.1988) which is adapted to the complexity of the problem.
However, this method requires an industrial plant and huge investments.
The present solution relates to technological improvements of the basic
principle claimed which permit miniaturization, a decrease of the
investment costs and a wider application of this method and of all other
applications of oligonucleotide hybridization for the determination of
genomic sequences.



c) State of the Art 

Knowledge about parts of genomes or entire genomes at the level of primary
structure, as well as the possibility of following inheritance using this
information, is being increasingly recognized as a condition for a more
efficient and faster study of biological processes. A part of the
experimental investigations will be replaced by computer research on the
sequences obtained. It might turn out that some biological phenomena
(evolutionary processes) will be accessible for study only through the
analysis of genome sequences.

Two areas are recognizable in which there is an increase of organized
effort to find methodological solutions which would allow the
determination of primary genetic information. One is concerned with the
detection of a large number of mapped polymorphic sites in the genomic DNA
of an individual, family or population. In fact, this project means the
determination of a defined part of the genomic sequence which represents a
specific genomic sub-entity, for which we propose the name GENOGRAM. The
other project deals with the determination of the entire sequence of human
and other genomes. In the extreme, it might mean the determination of the
sequences of the genomes of most species of interest and in a sufficient
number of individuals in each species. The first project has two basic
approaches: the detection of polymorphic sequence through the
specificities that RE have (RFLP) or through the specificity of
hybridization with ONP. The second method has advantages in following a
large number of polymorphic sites, since it does not require the
determination of DNA fragment length and is easily carried out on an
amplified target (PCR) or by an amplified ligation-hybridization reaction.
In order to obtain individual genetic maps with a 1 cM resolution it is
necessary to follow 5,000 to 10,000 polymorphic sites.

The second project is envisioned as a multiphase physical mapping with the
final goal of determining the nucleotide sequence. Physical mapping also
makes use of sequence recognition by RE and measurement of DNA fragment
lengths, or of the determination of the contents of ONS by hybridization
with ONP. Sequencing itself, according to existing methods, makes use of
the measurement of fragment lengths and thus determines the sequence
indirectly, but in the experiment. Two further approaches for determining
the order of nucleotides experimentally are being considered as well. One
is indirect and is based on the sequential removal of single nucleotides
from one end of a DNA fragment. The second one supposes direct
experimental reading of the sequence by means of a specific electron
microscope. Finally, the theory of an approach has been developed in which
the sequence is not arrived at directly by the experimental determination
of the order of nucleotides; instead, the contents of ONS are found and
these data are then transformed into sequence information by computation
(SBH). Presently, the only realistic way of determining the contents of
ONS is hybridization with ONP of the same kind as used in the methods
described above.
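
The computational step of SBH mentioned above can be illustrated with a
minimal Python sketch. It assumes an idealized case which the text does not
guarantee: the hybridization data are error-free, every ONS of length k
occurs in the fragment only once, and the starting ONS is known. The
function names are hypothetical and the greedy extension shown is only one
simple way of doing the computation, not the algorithm of the cited
application.

```python
def ons_content(sequence, k):
    """Return the set of all length-k oligonucleotide sequences (ONS) in a fragment."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reconstruct(content, start):
    """Rebuild a sequence from its ONS content by repeated (k-1)-base overlap.

    Extension stops when no unused ONS, or more than one, continues the
    current end; in that ambiguous case only a subfragment is returned.
    """
    k = len(start)
    unused = set(content) - {start}
    sequence = start
    while True:
        suffix = sequence[-(k - 1):]
        candidates = [ons for ons in unused if ons.startswith(suffix)]
        if len(candidates) != 1:
            return sequence
        sequence += candidates[0][-1]
        unused.remove(candidates[0])


fragment = "ATGCGTACCTGA"            # made-up example fragment
content = ons_content(fragment, 4)   # what hybridization with 4-mer ONP would report
print(reconstruct(content, "ATGC"))
```

With repeated ONS the loop stops early, which corresponds to the case in
which only subfragments rather than one continuous sequence are obtained.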

In both projects and in all methods (except possibly in microscopy), one
has to operate with a huge number of samples which, depending on
the size and number of genomes, are to be pro- 
cessed amounts to between 10* and 10 7 . Since 
each sample is subjected to one or more identical 
or similar reactions, there is the problem of the way and speed of
performing the great number of repetitive operations, as well as the
problem of the way and speed of gathering experimental information and
storing it in computer memory. Obviously, the solution for the first
problem is a robotized process, and the most efficient way of gathering
data is image analysis of the experimental image. The speed of image
analysis already amounts to between one million and 10 million pixels per
second, and this allows the differentiation of a 10 times smaller number
of point-shaped objects.

Here we define the "Informational Approach" for the study of genome
primary structure, analyze its characteristics and informational-technical
requirements, and give conceptual solutions for its major technological
components, which make it possible to substitute a miniaturized process
for a massive robotized process using addressed samples. Basically, the
idea is to use a mixture of samples in the form of discrete recognizable
particles.



THEORETICAL ANALYSIS 



Informational Approach and its Characteristics

A common characteristic of three methods which, for achieving different
goals, make use of "mismatch free" hybridization with ONP is the
experimental determination of the contents of a single, some or almost all
ONS in specific fragments of genomic DNA. These are the method of
detection of polymorphic sites, the method of forming an organized genome
library (link-up) and the SBH method. For the same targets there are
methods which are, in an informational sense, based on the experimental
determination of the position, i.e. the sequence, of individual
nucleotides or definite ONP. These methods are RFLP analysis, restriction
mapping and the forming of organized libraries on the basis of restriction
pattern, and sequencing by means of acrylamide gel. We define methods
which make use of the determination of the contents of ONS as the
"Informational Approach". In this approach the same principle is used for
the determination both of a total sequence and of selected parts of the
genome (GENOGRAM) and so, in general, this approach can be defined as the
"Informational approach for the study and following of the primary
structure of the genome". It has several essential characteristics.

The most essential feature of this approach is the possibility of
acquiring experimental data (contents of ONS) in unpositioned samples.
Since neither the sequence of elements nor the physical distance between
elements is the object of the measurements, there is no requirement for an
ordered spatial disposition of samples, nor do they need a defined
starting position, etc.

The second characteristic is that the contents of ONS can in principle be
determined in a sample which in the final experimental treatment occupies
a micro volume or an area of optimal size. Since no physical separation is
required, such transport effects as diffusion, etc. are avoided. These
effects usually impose the necessity for a macro volume.

The third characteristic (which is a consequence of the preceding ones) is
a considerable density of data bits per unit of experimental volume or
area. It can be defined as the possibility of achieving a high degree of
parallel acquisition of data.

The fourth positive characteristic of this approach is the fact that a
fraction of the data, such as sample position, order of elements in the
sequence, starting and ending DNA fragments or order of operations, does
not have to be determined experimentally; instead, it can be replaced with
computer processing of data which can be gained experimentally in a
simpler and faster way.

The fifth feature is the increase in the number of experimental data bits
(size of the matrix) in exchange for reducing the amount of work necessary
for the acquisition of a more sophisticated and smaller matrix. This means
that the burden of the process is placed on the data reading step or, in
other words, on the input of more data bits into computer memory.

These characteristics allow the formulation of a concept of a
miniaturized, fast and frugal process for the generation of the necessary
experimental data.

Scientific and practical needs for the determination of GENOGRAM and of
entire genomic sequences have the same informational-technological
requirements. We shall define a data BIT as a discrete, experimentally
gained, meaningful datum. Thus one data bit is the fact that the millionth
nucleotide of the 5th human chromosome is A, or that fragment X of human
genomic DNA contains ONS Y. The minimal number of data bits for the
determination of the entire sequence of one mammalian genome is 3 billion,
i.e. it is equal to the number of bp. Since the ultimate goal is to have
all data bits stored in computer memory, for the determination of the
process speed it is important to know the part (%) of the necessary number
of data bits which can be experimentally collected and stored in memory
per unit of time.

One can assume that in the future the time for determining the sequence of
a complex genome should preferably not be longer than a year. If a second
is taken as the unit of time (one year has 3.1536 x 10^7 seconds), then we
may consider it acceptable to acquire all data bits in 10^6 to 10^7
seconds. If one chooses 10^6 seconds (12 days without interruption) as a
very favourable time span, then one should obtain one millionth of the
necessary data every second. It should be taken into consideration that
the 12 days defined in the previous analysis represent in reality at least
a 5 to 10 times longer period, because a considerable part of this time is
not used for effecting sequencing operations but is used instead for
sample preparation, controls, etc.
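
The time budget above can be checked with a few lines of Python. This is
only an arithmetic sketch; it assumes a 365-day year and takes 10^6 seconds
as the uninterrupted acquisition time (the "12 days" of the text, more
precisely about 11.6 days). The variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY   # 31,536,000 = 3.1536 x 10^7

acquisition_seconds = 10**6
acquisition_days = acquisition_seconds / SECONDS_PER_DAY

# Fraction of all data bits that must be stored every second.
fraction_per_second = 1 / acquisition_seconds

# Real elapsed time if the whole procedure takes 5 to 10 times longer
# than the pure data acquisition (sample preparation, controls, etc.).
elapsed_days = [acquisition_days * factor for factor in (5, 10)]

print(SECONDS_PER_YEAR)
print(round(acquisition_days, 1), fraction_per_second)
print([round(days) for days in elapsed_days])
```

Even with the tenfold overhead the elapsed time stays well below the
one-year limit assumed in the text.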

How many data bits are necessary for the determination of either GENOGRAM
or the sequence of one mammalian genome by determination of the contents
of ONS? For sequence determination about 10^7 separate genomic fragments
(samples) and about 10^5 ONP are needed, and these figures require 10^12
data bits. For GENOGRAM at least 10^4 genomic sites or fragments are
necessary for a 1 cM map. It can be argued that 10^5 dots in a genome have
to be investigated because this figure is the approximate number of genes.
In that case GENOGRAM would need a comparatively great number of fragments
in which the contents of one or of most ONS would have to be determined.
In such an extensive GENOGRAM, in spite of multiple use of the same ONS,
the number of necessary ONS reaches the order of several tens of
thousands. In practice, it is more suitable to have each ONP with a length
of 7 or 8 bases and thus, with 16,000 or 65,000 ONP, to possess a system
for the detection of a change in any sequence. It is considered that the
most efficient way is not to determine specific pairs of probes and
targets but instead to try to get more information by a simple combination
of each probe with each target. Based on an analysis of this sort, it can
be said that for GENOGRAM about 10^9 data bits are necessary. We can
consider it reasonable to collect in 10^6 seconds (12 days) the
information for 100 to 1,000 GENOGRAMS. Roughly, this is the analysis of
several patients per day or the testing of a sample of 1,000 individuals
of a certain population within a period of two months. With such an
approach to the requirements of the molecular genetics of the future, we
again reach the same number of data bits as in the sequencing of a single
genome (about 10^11 to 10^12). Therefore, extensive population and
diagnostic investigations reach the amount of work necessary for the
sequencing of entire genomes.
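
The estimates in this paragraph follow from multiplying the number of
fragments by the number of probes. The short Python check below assumes one
data bit per probe-fragment pair. For GENOGRAM it pairs 10^4 sites with all
8-mers and 10^5 sites with all 7-mers; this pairing is an illustrative
assumption, chosen only to show that both ends land near 10^9.

```python
def data_bits(fragments, probes):
    """One data bit per probe x fragment hybridization result."""
    return fragments * probes


probes_7mer = 4**7   # 16,384 different 7-mers
probes_8mer = 4**8   # 65,536 different 8-mers

sequencing_bits = data_bits(10**7, 10**5)

genogram_small = data_bits(10**4, probes_8mer)   # 1 cM map, 8-mer probes
genogram_large = data_bits(10**5, probes_7mer)   # one site per gene, 7-mer probes

# 100 to 1,000 GENOGRAMS of about 10^9 data bits each.
batch_bits = (100 * 10**9, 1000 * 10**9)

# Data bits per second if everything is acquired in 10^6 seconds.
rate = sequencing_bits // 10**6

print(f"{sequencing_bits:.1e}", f"{genogram_small:.1e}", f"{genogram_large:.1e}")
print(batch_bits, rate)
```

The resulting rate of one million data bits per second is the figure the
rest of the analysis is built on.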

Since we established that a millionth part of all data bits has to enter
memory in one second, we arrive at a parameter of crucial importance for
this approach: how to produce and store a million data bits per second.

These huge technological requirements for acquiring knowledge of GENOME
structure can be met by developing a system for the quick gaining of the
necessary data bits. In the next chapters one such system, which makes use
of the characteristics of the informational approach, is described.



Phases of informational approach 


One can define four phases in the experimental part of this approach, when
the contents of ONS are determined by ONP hybridization. After storing the
information related to the contents of ONS in specific fragments of
genomic DNA, the information phase takes place, in which the entire
sequence or parts thereof are generated. These phases are:

1. Sample preparation (isolating, marking, preparation for 3.)
2. Probe preparation (synthesis, ONP bank, probe labeling)
3. Hybridization
4. Reading and storing data bits.

Three goals are defined which should be achieved either in each phase
separately or in the process as a whole: samples and probes should be
recognizable to the least possible extent by position (coordinates,
addresses) and to the maximum extent by their innate informational
characteristics; there should be the least possible number of independent
hybridization reactions; and there should be the ability to "read" one
million +/- ONP x TARGET data bits per second.

Samples as a system of discrete and recognizable 
particles 

The samples can be considered in two phases: sample preparation and
hybridization. The obvious solutions are the preparation of addressed
samples in microtiter plates and addressed hybridization dots in ordered
dot-blots. One can pose the question whether the use of addressed samples
can be avoided. This is especially important when the samples need not be
kept for permanent storage, or can easily be obtained when needed, so that
it is easier not to keep them. The task is particularly difficult when the
detection system requires more than one target molecule, i.e.
amplification of genomic fragments. On the other hand, one cannot
determine (or it is convenient not to determine, in order to retain
parallelism) the complete contents of ONS (i.e. hybridize all ONP) on a
sample present only once in the hybridization area. Hence, it is useful to
have a given genomic fragment in a large number of copies in one
hybridization spot and many such hybridization spots in separate
hybridization areas. One would like to recognize such hybridization spots
even if they are not separated as amplified genomic fragments in tubes
with known marks and ordered hybridization dots with known coordinates.

As in many other methods and techniques for chemical synthesis of DNA, the
answer is the use of a solid support as the substitute for reaction tubes
to keep the liquid samples apart. A drop of water containing the necessary
number of copies of a given DNA fragment is replaced by a solid particle
(further on in this text: discrete particle) which carries the required
number of copies of the same DNA fragment attached to its surface. These
particles can be looked upon as small beads of definite size and shape,
similar to the ones already in use in different applications. In order to
simplify the description, the use of discrete particles in the
hybridization reaction will be presented first, starting from physically
separated and amplified samples obtained in a preparatory phase.

So, there is a library of genomic clones placed on microtiter plates. One
adds discrete particles to each well and a binding reaction takes place,
resulting in DNA being attached to them. Thus each liquid sample is
divided into a certain number of DPs. Aliquots of DPs from each well are
mixed together and spread in a monolayer of the required density. This is
followed by the necessary number of fixations. In this way one can obtain
hybridization areas (HA) similar to filters in the dot blot procedure.
Every DP represents one dot, and the solid support a certain area of the
filter. One can imagine a simple case in which each HA contains enough
randomly distributed DPs that each clone from the library is represented
at least once. The other HAs are replicas, but the DPs are present in
different places. Every HA can in principle be hybridized and reused like
classic filters. The main problem is to recognize DPs with the same clone
in different HAs. The latter is probably very difficult to achieve with
the 10^5 ONP required for SBH. Also, the possibility of performing many
hybridization reactions in parallel is lost if only one HA is used.
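
How many randomly spread DPs are "enough" for each clone to be represented
at least once can be estimated with a simple Poisson model. The sketch
below is an illustration under assumptions that are not stated in the text:
all clones contribute equal numbers of DPs, DPs land in an HA independently
of each other, and the library has 10^7 clones. The function names are
hypothetical.

```python
import math


def missing_fraction(redundancy):
    """Poisson estimate of the fraction of clones with no DP in one HA.

    redundancy is the average number of DPs per clone in the HA.
    """
    return math.exp(-redundancy)


def redundancy_for(clones, expected_missing=1.0):
    """Average DPs per clone at which about expected_missing clones are absent."""
    return math.log(clones / expected_missing)


clones = 10**7
for redundancy in (1, 5, 10):
    print(redundancy, round(clones * missing_fraction(redundancy)))
print(round(redundancy_for(clones), 1))
```

Under this model a tenfold redundancy leaves only a few hundred of the 10^7
clones unrepresented in a given HA, and these are likely to be present in
the replica HAs.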

We see three principal ways of recognizing DPs, which can be combined:

1) Labeling with physical attributes of DPs like size, shape and color
which can be differentiated in the phase of reading, for instance, during
image analysis.

2) Labeling with different combinations of ONs which can be recognized as
such by hybridization with appropriate ONPs. Thus, out of 20 different ONs
about 2 x 10^5 (184,756) different combinations can be formed with 10 ONs
each. By attaching each combination of ONs to DPs, about 2 x 10^5
differently labeled DPs are obtained. They can be recognized by detection
of the given combination in hybridization with the 20 ONPs which are
complementary to the ONs in question.

3) Labeling (better: recognition) by the use of a certain fraction of the
ONPs on all HAs. This principle is the one used in the link-up method. The
requirement is that the HAs can withstand testing with a great number of
ONPs, so that a part of them can be used for recognition. Lehrach found
that 100 ONPs having the same density as 8- or 9-mer probes for SBH are
suitable for recognising overlapped cosmids in an entire genomic library.
In this case, recognition of identical clones, not of overlapping ones, is
required in a mixture of a defined number of different clones. To keep the
mixture simpler, not all clones need be mixed at once; separate mixes with
a smaller number of clones can be prepared and used to obtain subdivided
HAs, thus reducing the number of probes required.
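
The combinatorial labeling of principle 2 can be made concrete with a small
Python sketch. It counts the labels obtainable from 20 ONs taken 10 at a
time, which is of the order of 10^5, and shows on a reduced case (5 ONs, 2
per DP) how a label is read back from the pattern of positive
hybridizations. The function names and the list-of-booleans signal format
are assumptions made for illustration.

```python
import math
from itertools import combinations


def label_sets(num_ons, ons_per_label):
    """Enumerate every label: the set of ON indices attached to one kind of DP."""
    return [frozenset(c) for c in combinations(range(num_ons), ons_per_label)]


def decode(hybridization_signals):
    """Recover a DP's label from its positive/negative signal with each ONP."""
    return frozenset(i for i, positive in enumerate(hybridization_signals) if positive)


# Number of distinct labels for 20 ONs taken 10 at a time.
print(math.comb(20, 10))

# Reduced case: 5 ONs, 2 per label, gives 10 distinguishable kinds of DP.
labels = label_sets(5, 2)
signals = [False, True, False, False, True]   # DP positive with ONP 1 and ONP 4
print(len(labels), decode(signals) in labels)
```

Only as many recognition hybridizations as there are ONs are needed to read
all labels, which is why this principle scales well.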

With the combination of all three principles and the use of subdivided HAs
one can recognize the 10^7 separate samples required by GENOGRAM and SBH.
It is obvious that the use of HAs divided into 10 parts and 100 "labels"
per each principle allows the discrimination of this number of separate
samples. In this combination scheme, one has to prepare 10000 samples with
differently labeled DPs using principles 1 and 2. However, it is obvious
that maximal use of the third principle decreases the need to prepare
differently marked DPs, but increases the number of required
hybridizations.
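
The capacity of the combined scheme is a simple product, shown here as a
short Python check. It assumes that the three labeling principles and the
HA subdivision are independent of each other, so that their numbers of
distinguishable states multiply; the function name is illustrative.

```python
def distinguishable_samples(ha_parts, labels_per_principle, principles=3):
    """Samples that can be told apart when independent labels multiply."""
    return ha_parts * labels_per_principle ** principles


total = distinguishable_samples(ha_parts=10, labels_per_principle=100)

# DP types that must actually be prepared: principles 1 and 2 only,
# because principle 3 uses the samples' own ONS content as the label.
prepared_dp_types = 100 ** 2

print(total, prepared_dp_types)
```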



Oligonucleotide probes bound to DP: Inverse DP-SBH

Two ways of sequence determination by hybridization have been proposed in
which, instead of clones, ONPs are bound to a support in the form of dots.
A similar system for following "expression patterns" is being developed by
Southern (private communication). The problem of these methods is that for
ONPs of the suggested informational length of 8 bases in two cases, and 4
in the third one, either only very specific information about the complex
sample (8-mer located by poly A, Southern) can be determined, or the
sample used for hybridization must be a very short DNA, shorter than 170
bp in the one case and shorter than 200 bp in the other case. These
approaches are impractical for sequence determination of complex genomes
because more than 10 million hybridizations are needed. On the other hand,
separate synthesis of longer probes and their deposition onto a place in
the HA with predetermined coordinates is probably limited to 8-10-mer
lengths. However, by making use of recognizable DPs to which oligo probes
are bound, it is possible to overcome these problems, and such an inverse
procedure (ISBH) has the potential to be applied to genome sequencing. The
basis of ISBH is formed by DPs carrying a specific combination of
oligonucleotide targets for recognition according to principle 2, and
specific functional oligonucleotides of a length selected as in the case
of generating clones by selection (see the part "Direct preparation of the
fragments of genomic DNA on DP"). By a combined synthesis (see the part on
the oligo bank) all different ONPs of a particular length can be
synthesized in a comparatively small number of reactions, in such a manner
that each ONP is on a DP with a specific known combination of oligo
targets. DPs prepared in this way are used for forming a monolayer HA
(OHA, oligo-hybridization area), as would be the case if DNA fragments of
a certain clone were linked to them. In order that most ONPs are
represented in an OHA by at least one DP, a certain amount of redundancy
is necessary (ca. 10 times). By hybridization of the OHAs with the ONPs
which are complementary to the oligo targets from which the combinations
for marking DPs are prepared (for the most complex OHAs fewer than 50
probes and hybridizations are necessary, because 50 oligo targets are
sufficient for preparing more than 10^14 combinations of 25 targets in a
small number of reactions), the exact position of each ONP in each OHA is
established. OHAs prepared in this way, with information on the position
of each ONP, are a "product" which can be used for very fast and simple
sequence determination of very long DNA fragments. The maximum length (DNA
complexity) (L) depends on the length of the ONPs (N) which have been
synthesized on the DPs in a given HA. In order to obtain, on average, the
sequence of the given fragment in the form of 10 subfragments (SF in the
nomenclature of SBH), L should not be considerably greater than (4^N)/100.
For cosmid sequencing 11-mers are necessary, for YAC insert sequencing
13-mers. In the former case 4 million differently marked DPs are needed,
in the latter 65 million. This is within the number of clones necessary
for sequencing mammalian genomes by direct SBH with 8-mer probes. If YAC
clones were used in ISBH, then for a mammalian genome about 1-5 thousand
groups with 10 differently marked clones in each, and the same number of
hybridizations, would be needed, while in the usual SBH 10000
hybridizations with groups consisting of 10 differently marked 8-mers are
needed. The number of syntheses and operations is in this case the same
for both methods.
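
The relation between probe length N, the number of differently marked DPs
and the maximum fragment length L can be tabulated with a few lines of
Python. The sketch takes the rule of thumb L <= 4^N/100 from the text at
face value; the function names are illustrative. It also checks the number
of 25-target combinations available from 50 oligo targets.

```python
import math


def probe_count(n):
    """Number of different oligonucleotides of length n (4 bases per position)."""
    return 4 ** n


def max_fragment_length(n, ratio=100):
    """Largest fragment length L, in bp, satisfying L <= 4^n / ratio."""
    return probe_count(n) // ratio


for n in (11, 13):
    print(n, probe_count(n), max_fragment_length(n))

# Marking combinations: 25 oligo targets chosen out of 50.
print(math.comb(50, 25))
```

For 11-mers this gives about 4.2 million probes and a limit near 42 kb,
which is the size range of a cosmid insert. For 13-mers it gives about 67
million probes (quoted as 65 million in the text) and a limit near 670 kb.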

A process for sequence determination using OHA would comprise several
simple steps:

1) random fragmentation of a sufficient mass of the given fragment of
genomic DNA to lengths slightly bigger than those of the ONPs linked to
the DPs;

2) marking of the fragments generated; one possibility is the use of
terminal transferase and one fluorescently marked nucleotide triphosphate;

3) discriminative hybridization of the marked genome fragments with the
OHA;

4) "image analysis" of the microscopic OHA image.

In these four steps the ONP content of the given long DNA fragment would
have been determined. By computer processing of the data according to the
algorithm and programmes for generating SF, either a continuous sequence
of the given fragment or the sequence in the form of a limited number of
SFs would have been regenerated.

ISBH can be successfully applied for the determination of one very
important part of genome information, namely the sites with polymorphic
sequence. All that one needs is a sufficient number of functional
oligonucleotides which in most cases have only one target with
complementary sequence in the given sample. For a mammalian genome, the
most suitable for this purpose are 17-mers. On average, each tenth 17-mer
should have a complementary sequence in a given mammalian genome. With an
OHA containing 10^8 17-mers (less than 1/100 of all 17-mers) about 10^7
17-mers would be detected as positive. Since with 17-mers 17 bp can be
read, an OHA of this sort would allow "reading" of at least 10^8 bp. Since
it is believed that in each group of 1000 bp there exists one polymorphic
bp, such an OHA would allow the following of about 10^5 polymorphic sites.
By analysing individuals in several generations from several families, a
very dense genetic map (0.1 cM) could be determined which would be useful,
in a much simpler way than RFLP markers, for following in a great number
of individuals for various investigations.
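
The orders of magnitude in the 17-mer estimate can be reproduced with a
short Python calculation. It is a sketch under simplifying assumptions that
go beyond the text: a genome of 3 x 10^9 bp with random sequence, one
strand counted, and positive 17-mers that do not overlap each other. The
variable names are illustrative.

```python
all_17mers = 4 ** 17          # 17,179,869,184 different 17-mers
genome_bp = 3 * 10**9         # mammalian genome size used in the text

# Chance that a random 17-mer occurs in a random genome of this size.
positive_fraction = genome_bp / all_17mers

oha_probes = 10**8            # 17-mers on one OHA
oha_share = oha_probes / all_17mers

positives = oha_probes // 10  # the text's "each tenth 17-mer"
bases_read = positives * 17   # upper bound, reached if positives do not overlap
polymorphic_sites = 10**8 // 1000   # one polymorphic bp per 1000 bp read

print(round(positive_fraction, 2), round(oha_share, 4))
print(positives, bases_read, polymorphic_sites)
```

The computed fraction of about 0.17 is consistent with the statement that
roughly each tenth 17-mer has a complementary sequence in the genome.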

ISBH has several significant characteristics:

1) With the possibility of a great number of rehybridizations, the OHA
acquires the properties of a measuring instrument or, stated in
informational jargon, of a CHIP (alternatively: sequencing card) which
permits minimal sample processing.

2) The possibility of preparing OHAs of different complexity for
sequencing fragments of different lengths. One can imagine an OHA with
200000 9-mers for sequencing 1-2 kb fragments, an OHA with 4 million
11-mers for sequencing cosmid inserts of 50 kb, an OHA with 65 million
13-mers for sequencing YAC inserts and, what is certainly most attractive,
OHAs with from 1 billion 15-mers to 1000 billion 20-mers for sequencing
complete chromosomes, or genomes, or the entire mRNA (cDNA) of a specific
tissue in only one hybridization reaction. It should be mentioned that no
additional difficulty is imposed by samples consisting of several shorter
fragments (mRNA of a certain tissue). However, in the case of the total
mRNA (cDNA) of a specific tissue, a problem can arise from the different
quantity of each mRNA. One possible solution for this problem is to use a
sufficient mass of the sample (PCR application) in order to bring the
least represented mRNAs [...] exists which has nothing to be linked to.

3) There are no requirements for the 10-100 different markers which are
almost unavoidable in the usual SBH in order to decrease the number of
separate hybridizations. If they do exist, samples with different markers
can be sequenced by a simultaneous hybridization with the sequencing card.

4) The possibility of highly specific labeling (incorporation of 100
marked nucleotides by means of terminal transferase), by means of which
both the required number of ONP molecules per DP
and the mass of DNA fragments being sequenced
are decreased. For 10 s ONP molecules per one 
DP, in the case of 15-mers, for 100 OHAs with a redundancy of 10 times, it
is necessary to perform 3000 syntheses, each one on the present usual
scale for the synthesis of oligonucleotides, in an amount of 1 mg. If 1000
molecules of ONP per DP are sufficient, then with 400 syntheses on a 10 mg
scale 100 OHAs with all 20-mers can be prepared.

5) The possibility of achieving great accuracy in hybridization. In order
to avoid forming a great number of SFs, it is necessary to have such a
ratio between L and N that, on average, only each tenth ONP possesses a
complementary sequence in the given fragment of genomic DNA. On the other
hand, that means that the chances for a larger number of sites with one
non-paired nucleotide, which represents the most difficult case for
discrimination, are small. When L/4^N is about 1/1000, the oligonucleotide
probes approach the discriminativity possessed by unique genome probes.

The main uncertainty of ISBH is hybridization with a very complex probe,
especially in the case of using ONPs longer than 13 bases and genome
fragments larger than a million bp. The basic problem is simultaneous
hybridization with ONPs having two extreme GC contents. Some solutions of
this problem have already been given, for instance, washing in
tetramethylammonium chloride. Another problem is of a technical nature and
has already been mentioned. It is the combination of oligonucleotide
synthesis and linking of already synthesized oligonucleotides to the same
DPs. Since these two reactions do not necessarily take place at the same
time, the solution of this difficulty does not represent a huge,
unsolvable practical problem. On the other hand, highly homologous and
simultaneously highly repetitive sequences represent significant obstacles
for this approach. In direct SBH with clones this problem has been solved
by using libraries with clones of different size. Because of these
sequences (LINE, SINE) a much larger number of subfragments (SF) will be
formed in comparison with the case of a genome with random sequence. The
solution of this problem is to use as big a 4^N/L ratio as possible and/or
to use the existing and new information from clone systems and other
methods for comparing the generated SFs.



Direct preparation of the fragments of genomic DNA on DP



Depending on whether the detection of ONS by hybridization can be done on
one or on a large number of molecules of the given fragment, and on the
mode of fragment amplification, one can define three possible ways of
marking samples as direct mixes of DPs. Such an approach eliminates the
need for preparing and maintaining macro-separate and addressed samples.

1) Detection on an individual molecule by using
DPs labeled according to principles 1 and 2 in
combination with recognition according to principle 3.
The labeled DPs serve for discrimination of the
parts of a genome, such as chromosomes, YACs,
etc., or of individuals in the parallel preparation
of a number of GENOGRAMS. The fragments of a
defined part of genomic DNA will be attached as unit
molecules to specifically marked DPs in separate
reactions. DPs are mixed afterwards and used to
form one HA. DPs carrying the same, or more often
overlapping, fragments will be recognized using
principle 3 from the previous section. In contrast
to using cloned fragments, the fragments are here
rarely identical, so that groups of densely
overlapped fragments are recognized. In this
situation the complete contents are obtained only
for the part of the sequence shared by the given
group of fragments. To use random groups of
fragments obtained by ligation (ordering library in
SBH) one needs to PCR or clone them without
separation of clones. The "separation" of the
fragments required for a GENOGRAM from the rest of
genomic DNA is best accomplished by a PCR
reaction.

2) Amplifying by PCR. PCR can be used for
preparation of a genomic library of fragments with
a continual length determined by the success of
amplification. A length of 5 kb has been
demonstrated. The procedure would require ligation
of linkers to the ends of the genomic fragment
mixture, dilution of the ligation products to single
molecules per volume, and then their use in separate
PCR reactions, for example in microtiter wells. In
this way clones of the starting fragments could be
obtained in vitro.

It is possible to envisage the implementation of PCR
without the separation of individual fragments into
addressed liquid samples. The requirement is that
micro droplets of an amplification mixture, each
containing either a single fragment or none, are
enclosed into small spheres (pearls) formed of
appropriate membranes (perhaps semipermeable
ones) together with DP conglomerates. DP
conglomerates are composed of DPs of similar
characteristics and should be easily separable into
individual DP components under mild conditions.
The use of conglomerates provides the way to
prepare more DPs with the same DNA fragments,
which are required for multiple HA "replicas".
Microsphere formation should be considered, like
the formation of fat droplets, as a statistical
process with a certain degree of success, rather
than a highly robotized process with high fidelity.

Every microsphere represents a separate
amplification reaction similar to a microtiter well.
After the amplification, the reaction of binding the
amplified fragments to spheres is performed, using
a suitable reagent for which the membrane is
permeable. The disruption of membranes and
conglomerates results in a mix of DPs in which each
DNA fragment is represented in a sufficient number
of copies on an adequate number of DPs.

3) The separation of groups of densely overlapping
fragments on DPs capable of selecting, instead of
amplification of a single fragment. One can
imagine separation by selection on the basis of
hybridization. One should have 10 million DPs,
each carrying a specific ON. The ONs will have
lengths which ensure that they occur mostly once
in a genomic sequence. The ways of obtaining this
number of DPs will be explained later. Random
fragments (longer than finally required) obtained
from a large mass of genomic DNA are subjected
to the action of 5' or 3' exonucleases. These
fragments are subsequently randomly cut and again
size selected. After selective hybridization,
cloning and covalent linking by ligation are
performed. In this way, each DP will have bound to
itself those ONs which are internally displaced by
the length of the single-stranded end containing the
given ON. The recognition of DPs with the "same"
fragments can be done by labeling DPs by any one or
a combination of principles 1 and/or 2, and/or by
using recognition without labeling DPs according to
principle 3. This selective procedure is even more
applicable to GENOGRAM, where the number of
samples per individual genome is 100-1000 times
smaller. DPs would carry ONs of selected sequence
complementary to the sequence of the fragments
that ought to be examined.

Procedure 1 is the simplest one in a technological
sense, but the detection of hybridization on a
single molecule is a difficult, still unresolved
problem. The other two procedures presume many
technically untested operations. On the other hand,
several different, theoretically possible solutions
allow the conclusion that preparation of defined
fragments of genomic DNA, as separate samples,
can be achieved in a DP mix.



ON bank 



Preparation of ONs (ONPs) in a mixture 

The synthesis of a large number of separate ONs
is a considerable task if standard "gene machines"
are used. However, the synthesis of ONs can be
appreciably speeded up by using combination
principles. This approach ensures a more rational
and cheaper synthesis of smaller quantities of
individual ONs. An even higher degree of
rationalization can be achieved by synthesis of
sufficient quantities of a large number of different
ONs which have multiple applications and can be
used by different laboratories. This principle has
already been used in the synthesis of linkers,
adapters and primers. In this way an ON bank would
be obtained (an initiative by Crkvenjakov, Drmanac,
Beattie). One can ask which bank would be the most
useful. The answer lies in the recognition of the ON
characteristics that are the most suitable for the
major areas of their application, and these are:
detection of sequence by hybridization, change of
existing DNA sequence and synthesis of DNA.
Hybridization with DNA fragments (amplified
fragments, subclones, clones suitable for SBH) can
be performed even with very short ONPs, 8, 7 or
even 6 nucleotides in length. ONPs about 20 bases
long are suitable for hybridization with total
genomic DNA. Primers for site-specific mutagenesis
and PCR are usually 15-20-mers. Even 8-mers are
active primers in PCR. A procedure for DNA
synthesis based on sequential joining of short
blocks is being developed. We consider the bank
containing all possible 3-mers to 8-mers very
useful for the following reasons: (i) the mentioned
areas of application, (ii) a technologically
acceptable number of samples for the bank to
contain, (iii) the possibility of generating longer
ONs from shorter ones (ligation, the use of dideoxy
nucleotides and terminal transferase). Their total
number is about 90,000. According to Beattie's
calculation, a bank of 8-mers (65,536 ONs) could be
synthesized in less than 6 months with a total
investment of 3 million dollars. The cost of
materials and labor for such a bank having a stock
of 1-2 mg of each ON is 2 million dollars. The cost
for all ONs in a stock of 10 µg each would amount
in total to about 10-20 thousand dollars, and that
is some 1000 times less than the present commercial
price.
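
The size of such a bank can be checked with a few lines of
arithmetic. The sketch below is only an illustration; it assumes
that the bank spans all lengths from 3 to 8 nucleotides, which is
the range that reproduces the quoted total of about 90,000 and the
65,536 8-mers. The names `count_oligos`, `bank_sizes` and
`total_bank` are hypothetical.

```python
# Number of distinct oligonucleotides (ONs) of a given length over
# the four-letter alphabet A, C, G, T.
def count_oligos(length: int) -> int:
    return 4 ** length

# Assumed bank: every ON from 3-mers up to and including 8-mers.
bank_sizes = {n: count_oligos(n) for n in range(3, 9)}
total_bank = sum(bank_sizes.values())

for n, size in bank_sizes.items():
    print(f"{n}-mers: {size}")
print("total ONs in bank:", total_bank)
```

The 8-mers alone account for three quarters of the bank, so the
cost of the whole bank is dominated by its longest members.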

The usefulness of making an ON bank on a solid
support (perhaps even DP), which could be
subsequently processed by machines or manually,
affording specifically modified or longer ONs, has
been considered too (Beattie). The use of a mixture
of differently marked ONPs, either in SBH or in
other methods, secures the possibility of ONP
synthesis as a mix, instead of forming it by
mixing. For instance, 64 3-mers are synthesized,
each being placed on a large amount of DP and
being marked differently, for example by
characteristics (fluorescence) of the molecule
mediating attachment of the nucleotide to DP.
Mixtures of equimolar amounts of the 64 3-mers are
prepared and distributed in 1024 samples. Then,
synthesis of one 5-mer is continued in each part.
In this way, with 1088 synthetic reactions all
8-mers can be synthesized in mixtures of 64 each.
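
The bookkeeping of this two-stage scheme can be sketched as
follows. This is an illustration under stated assumptions, not the
chemistry itself: the 5-mer is simply written after the 3-mer (the
text does not fix the orientation), and the names `trimers`,
`pentamers` and `mixture_for` are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Stage 1: 64 separately synthesized (and separately labeled) 3-mers.
trimers = ["".join(p) for p in product(BASES, repeat=3)]
# Stage 2: one 5-mer is added to the pooled 3-mers in each of 1024 samples.
pentamers = ["".join(p) for p in product(BASES, repeat=5)]

def mixture_for(pentamer: str) -> list[str]:
    """8-mers present in the sample that received this 5-mer.

    Assumption: the 5-mer is appended to the 3-mer.
    """
    return [trimer + pentamer for trimer in trimers]

reactions = len(trimers) + len(pentamers)
all_octamers = {octamer for p in pentamers for octamer in mixture_for(p)}

print("synthetic reactions:", reactions)
print("distinct 8-mers obtained:", len(all_octamers))
print("8-mers per mixture:", len(mixture_for("AAAAA")))
```

The number of reactions is the sum of the two stages, while the
number of products is their product, which is the whole point of
synthesis in a mixture.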

A similar principle could be applied for the
synthesis of 10^7 DPs, each carrying a different
longer ON (for example, a 16-mer). These are
necessary for preparation of samples in a mix
according to the principle of selective
hybridization (see above under 3). For instance,
3 groups with 100 DPs in each are used (if possible,
the 100 DPs have different physical
characteristics), and in each group 100 different
ONs are linked to DP. For 16-mers, 2 groups are
5-mers and 1 is 6-mers. The same 3 groups of ONs
exist as free ONs, not linked to DP. In separate
reactions involving successive permutational
coupling of DP groups and groups of non-attached
ONs, 6 million DPs (a definite number of DPs) with
different ONs, each having 16 bases, would be
obtained. In this way, with 300 different starting
ONs and with several reactions of permutation and
linking, the necessary number of various ONs in a
mixture (in this case a DP mixture) can be prepared.
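
One way to read the figure of 6 million is sketched below. The
reading is an assumption: each 16-mer is built from one block of
each of the three groups, and every ordering of the three groups is
coupled separately. The sketch also assumes that different
orderings never produce the same sequence. All names are
hypothetical.

```python
from math import factorial

group_size = 100          # different ONs (and DPs) per group
block_lengths = (5, 5, 6)  # two groups of 5-mers, one group of 6-mers
n_groups = len(block_lengths)

starting_ons = n_groups * group_size           # ONs that must be synthesized
orderings = factorial(n_groups)                # possible orders of the groups
per_ordering = group_size ** n_groups          # one block taken from each group
distinct_16mers = orderings * per_ordering

print("length of product:", sum(block_lengths))
print("starting ONs:", starting_ons)
print("DPs with different 16-mers:", distinct_16mers)
```

With one fixed order of the groups the yield would be one million;
the six possible orders account for the factor of six.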

Similar combinative synthesis can be applied
for obtaining the bank of DPs recognizable on the
basis of oligotarget combinations, without or with
a functional oligonucleotide which plays a role in
selecting a fragment with a complementary sequence
existing in the given sample of nucleic acids.
Marking with such combinations will be explained
with the example of using a group of 36 different
oligotargets and preparing combinations of 18
different targets. The maximum number of these
combinations, if they were formed in separate
reactions, would be 9 billion. However, with a
comparatively small number of separate reactions,
through successive linking of combinations with a
smaller number of different oligotargets, it is
possible to obtain an essential part of all
combinations of 18. If the 36 oligotargets are
divided into three groups of 12, each group will
contain 924 combinations of 6 oligotargets. After
linking the first 924 combinations in the same
number of separate reactions, it is necessary to
effect equimolar mixing of all DPs and to separate
them into 924 tubes with combinations from another
group of 12 oligotargets. By repeating the cycle
once more, in 2772 reactions a mixture with 750
million DPs with different combinations
of 18 oligotargets is obtained. Of course, a DP
with a specified combination of oligotargets means
that each oligotarget is present in a certain number
(10--I0 b ) on a given DP, DPs having the same
combination being present in a certain number in
the final mixture, which depends on the mass of
non-marked DPs with which the process was started.
Thus, when less than 10% of the combinations are
used, a sufficient number is obtained for
generating, i.e. labeling, the necessary number of
clones for SBH of mammalian genomes. For
discrimination of the same clones, i.e. DPs with
the same combination, it is necessary to carry out
36 hybridizations with oligotargets complementary
to probes.
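
The combinatorics of this marking scheme can be verified directly.
The sketch assumes that each of the three groups of 12 oligotargets
contributes a combination of 6, so that every DP ends up with 18 of
the 36 oligotargets; the variable names are hypothetical. The exact
number of reachable combinations comes out somewhat above the round
figure quoted in the text, but of the same order.

```python
from math import comb

total_targets = 36
groups = 3
per_group = total_targets // groups      # 12 oligotargets per group
chosen_per_group = 6                     # 3 x 6 = 18 targets on each DP

all_combinations = comb(total_targets, groups * chosen_per_group)
combos_per_group = comb(per_group, chosen_per_group)
reactions = groups * combos_per_group    # one reaction per combination per cycle
reachable = combos_per_group ** groups   # combinations present in the final mix
fraction = reachable / all_combinations

print("all combinations of 18 from 36:", all_combinations)
print("combinations per group of 12:", combos_per_group)
print("separate reactions:", reactions)
print("combinations obtained:", reachable)
print(f"fraction of all combinations: {fraction:.1%}")
```

The fraction obtained stays below 10% of all possible combinations,
in line with the statement that less than 10% of the combinations
is used.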

For generating clones through selection it is
essential that all DPs in a mixture carrying the
same combination of oligotargets possess, in a
certain number of copies, the same oligonucleotide
of the selected functional length. For 10 million
clones, taking into consideration that the process
efficiency will be 10%, it is necessary to have 100
million DPs with different combinations and
different functional oligonucleotides. For ISBH that
number can be within a range of 10 6 -10 1: depending
on the lengths of DNA which are to be sequenced in
one reaction. Besides, with ISBH one has to know
which oligonucleotide is bound to which combination
on the same DP, which is not the case with
selective forming of clones. Also, for ISBH one has
to use a large amount of oligosequences of the
specific length.

The principle of preparation of these DPs will
be explained with the example of the synthesis of
all 15-mers in three cycles. Basically it is only an
extension of the procedure for marking with
oligotarget combinations. In any case, each cycle
supplies one third (a 5-mer) of every 15-mer. Three
groups should be formed, each containing 1024
combinations (this is the number of different
5-mers), starting from a smaller number of
different oligotargets. In this case, it is a part
of the combinations of 6 oligotargets from a group
of 13 oligotargets. The total number of oligotargets
is 39, i.e. this is the number of hybridizations
necessary for DP discrimination. In each group, the
first 5-mer (for instance, AAAAA) is added to the
first combination, and so on until GGGGG is
reached. At this moment, non-labeled DPs are added
to each combination from the first group, and then
the oligotargets and the given 5-mer are linked to
DP; DPs are mixed and equimolarly distributed into
the combinations of the second group.
In this step, oligotargets are linked to DPs as
separate molecules, and 5-mers are linked to the
existing 5-mers, affording 10-mers. In each of the
1024 reactions of the second group 1024 10-mers are
synthesized, i.e. all 10-mers are synthesized. By
repeating the same operations in the third cycle,
all 15-mers are obtained in 3072 reactions in such
a manner that one knows exactly which 15-mer is on
the DP with which specific combination of
oligotargets. 5-mers are not necessarily added as
finished units; instead, their synthesis can be
executed in the given 3072 reactions. When the
complete procedure is performed in 5 cycles with 64
reactions each, then in total only 320 independent
reactions are required. In this case, it is
necessary to divide 40 oligotargets into 5 groups
of 8, using combinations of 4 oligotargets within
each group. These examples illustrate the power of
combinative synthesis, in which the number of
operations grows by arithmetical and the number of
syntheses by geometrical progression.
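
The two variants of the 15-mer synthesis can be compared
numerically. The sketch below is an illustration only; `plan` and
its parameters are hypothetical names, and the reading that each
group of oligotargets must supply at least as many label
combinations as there are blocks per cycle is an assumption drawn
from the description above.

```python
from math import comb

def plan(oligo_len: int, cycles: int, targets_per_group: int,
         targets_per_combo: int) -> dict:
    """Reaction and label counts for synthesis in equal-sized blocks."""
    block_len = oligo_len // cycles
    blocks_per_cycle = 4 ** block_len            # e.g. 1024 different 5-mers
    labels = comb(targets_per_group, targets_per_combo)
    return {
        "reactions": cycles * blocks_per_cycle,  # grows arithmetically
        "hybridizations": cycles * targets_per_group,
        "labels_per_group": labels,
        "enough_labels": labels >= blocks_per_cycle,
        "distinct_oligos": blocks_per_cycle ** cycles,  # grows geometrically
    }

three_cycles = plan(15, cycles=3, targets_per_group=13, targets_per_combo=6)
five_cycles = plan(15, cycles=5, targets_per_group=8, targets_per_combo=4)

print("3 cycles of 5-mers:", three_cycles)
print("5 cycles of 3-mers:", five_cycles)
```

Both variants yield all 4^15 15-mers; moving from 5-mer blocks to
3-mer blocks cuts the number of reactions roughly tenfold while the
number of discriminating hybridizations stays almost unchanged.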

Taking into consideration the possibilities of
combined synthesis, bank forming and synthesis in a
mixture, one can conclude that the number of
syntheses and manipulations is not necessarily large
and, consequently, the ON price should be directly
related to the mass, i.e. the cost of the material.
The use of DPs as a substitute for dot-blots offers
in this sense considerable advantages because it
requires a lower amount of ONP. For the same target
density per area, a much smaller amount of target
is necessary per DP than per dot. The total area of
HA is also much smaller, and this decreases the
necessary amount of hybridization buffer, i.e. ONP
mass. If the DP diameter is 4 µm, then the area of
its maximum section is ca. 10 µm^2. When the dot
area is 1 mm^2, the ratio is 1:10^5. Based on the
calculation that in forming a random monolayer
10-fold more DPs must be used in order that each
one is represented at least once in HA, and that in
this case the utilization of space is only 10%, the
ratio would be 1:1000. Speaking in absolute numbers,
in the case of DPs the area of one HA would amount
to 10x10 cm, while with dots it would be 1x1 m.
Assuming that one HA can be used for testing 1000
ONPs (mixture of 10-100 ONPs x 100-10 washings),
the total area is 1 m^2 vs. 1000 m^2. In the first
case the necessary
amount of each ONP in SBH can be calculated in
the following manner. One ONP has a target in
each tenth clone, and since ten times more DPs are
necessary because of random sampling, the total
number of DPs with which one ONP is hybridized
equals the number of clones (10 million). For
signal detection on one DP using CCD cameras, it
is sufficient to perform labeling with less than
1000 fluorescent molecules (private communication).
If one supposes that in hybridization only 0.1% of
ONP is made use of, then one needs 10^13 ONP
molecules for hybridization with all clones. Since
1 µg of 8-mer ONP contains about 3x10^14 molecules,
such a mass of an individual ONP is more than
sufficient. The dot system would probably require a
larger mass, of the order of 1 mg. The dollar
savings per one genome (or 100 GENOGRAMS),
according to Beattie's prices for a library, would
be about a million US$ in a transition from the dot
to the DP system.
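
The area and mass estimates above can be retraced step by step. The
sketch is a rough check only. It uses the rounded section area of
10 µm^2 given in the text, and it assumes an average mass of about
330 g/mol per nucleotide, which is not stated in the source. All
variable names are hypothetical.

```python
import math

# --- Area per sample: DP versus dot ---------------------------------
dp_diameter_um = 4.0
dp_section_um2 = math.pi * (dp_diameter_um / 2) ** 2   # exact value, ~12.6
dp_section_rounded_um2 = 10.0                          # value used in the text
dot_area_um2 = 1e6                                     # 1 mm^2 in um^2

raw_ratio = dot_area_um2 / dp_section_rounded_um2      # dot : DP
oversampling = 10      # 10-fold more DPs for random sampling
space_use = 0.10       # only 10% of a random monolayer is occupied
effective_ratio = raw_ratio / oversampling * space_use

# --- ONP molecules needed per probe ---------------------------------
clones = 10_000_000
dps_hybridized = clones * oversampling // 10   # target in every tenth clone
labels_per_dp = 1000                           # fluorescent molecules per DP
fraction_used = 0.001                          # 0.1% of ONP actually hybridizes
molecules_needed = dps_hybridized * labels_per_dp / fraction_used

AVOGADRO = 6.022e23
mass_per_nucleotide = 330.0                    # g/mol, assumed average
molecules_per_ug = 1e-6 / (8 * mass_per_nucleotide) * AVOGADRO

print(f"DP section area: {dp_section_um2:.1f} um^2")
print(f"dot:DP area ratio, raw: {raw_ratio:.0e}, effective: {effective_ratio:.0f}")
print(f"ONP molecules needed: {molecules_needed:.1e}")
print(f"molecules in 1 ug of 8-mer: {molecules_per_ug:.1e}")
```

Under these assumptions 1 µg of an 8-mer holds some twenty times
more molecules than one probe needs, which supports the statement
that such a mass is more than sufficient.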



Detection of ONs contents on a level of one DNA 
molecule 

If one restricts oneself to consideration of
hybridization as a procedure for determination of
ONs contents, the problem of detection of a single
target molecule has two components. The first is
the possibility (successfulness, efficiency,
probability) of occurrence of the hybridization
event with a single target molecule (there can be
an excess of ONP), and the second is the detection
of the hybrid obtained. Since no efficient or simple
procedure for detection of single molecule
hybridization has been developed so far, there is
no knowledge of this reaction either. One can
assume that the event of single molecule
hybridization occurs with a certain probability (in
a defined % of trials). The detection of such an
event can be of two kinds. In the first, the signal
is produced by the marker on the hybridizing probe
(e.g. fluorescent molecule, enzymatic activity),
even if later amplified in various ways. In the
second kind, the hybridization event itself is
amplified. Its logic is the same as that used in all
exponential doublings in natural and in vitro
amplification reactions (cell division, DNA
replication, PCR, ligation-amplification reaction
(LAR)). The total amount of product is k x n^c,
where n is usually 2 and represents the
amplification factor, c is the number of cycles and
k the efficiency factor.
For GENOGRAM determination one can use LAR,
since the basic requirement of the method is
fulfilled, namely previous knowledge of the
sequence or of the small number of its variants. LAR
is more difficult to apply to SBH of unknown
sequence. The ON which is the reporter of the
hybridization event (it usually carries biotin)
would have to be very short in order to use all
theoretically possible sequences in a mixture
(probably a 4-5-mer). In addition, LAR has the
problem of how to localize the ligation product on
the dots or DPs on which it is formed. If the
problem of local fixation is resolved, one can avoid
the requirements of the specificity of the ligation
reaction and of reporter molecules. The scenario for
simple amplification hybridization could look this
way:
DPs carrying a single target and the capability
of binding ONPs having a specific chemical group on
one end are prepared. Then one hybridizes with an
ONP that is both complementary and carries the
chemical group. After discriminative hybridization
and washing, the reaction of denaturation and
binding of ONP to DP is performed. Care should be
taken that this is possible only with an ONP which
is hybridized to the target on a given DP. In this
way a DP having a positive hybridization would
carry two targets. The hybridization reaction is
repeated, using both the starting and the
complementary ONPs (synthesized target). In a new
cycle of denaturation and binding of hybridized
ONPs, the number of targets is doubled. One repeats
these cycles until a detectable number of ONPs is
bound to DPs. It is interesting to note that after
the first cycle, which should be with short ONPs
(8-mers for SBH), one can switch to hybridization
with longer probes and targets quite independent
from the primary target. The discrimination is thus
easily achieved in a high number of cycles. This is
accomplished by the use of synthesized targets
which are longer than the primary ONP and by the
use of one additional ONP complementary to the
synthesized target.
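
The number of cycles such a scheme would need can be estimated from
the k x n^c relation. The sketch below is purely illustrative: the
detection threshold of 1000 bound molecules is borrowed from the
CCD figure quoted elsewhere in the text, the per-cycle factor of 1.5
stands for an arbitrary imperfect doubling, and the function names
are hypothetical.

```python
def product_after(cycles: int, factor: float = 2.0,
                  efficiency: float = 1.0) -> float:
    """Amount of product k x n^c, starting from a single target."""
    return efficiency * factor ** cycles

def cycles_needed(detectable: float, factor: float = 2.0,
                  efficiency: float = 1.0) -> int:
    """Smallest number of cycles c with k x n^c >= detectable."""
    cycles = 0
    while product_after(cycles, factor, efficiency) < detectable:
        cycles += 1
    return cycles

threshold = 1000  # assumed number of bound ONPs needed for detection

print("perfect doubling:", cycles_needed(threshold), "cycles")
print("factor 1.5 per cycle:", cycles_needed(threshold, factor=1.5), "cycles")
```

Even with a markedly imperfect amplification factor, the required
number of cycles stays within a range that could be repeated on the
same DPs.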

This scheme is just a theoretical possibility for
the detection of single molecule hybridization and
does not presume experimental feasibility. In any
case, detection of hybridization (or detection in
general) on the level of a single molecule, being a
process with a small probability of a positive
outcome, can be treated statistically as the result
of trials on a large number of DPs with the same
target, or of a larger number of trials with the
same ONP on the same DP. This larger number of
trials on one DP is integrated over many cycles of
hybridization. In this case the positive
hybridization is recognized within a wide span of
signal intensities above a certain threshold. This
is similar to the situation in which there is a
large difference in the molarity of amplified
targets among dots or DPs.



Image analysis 

The possibility of different labeling of ONPs for use
as a group

It has been established that in the informational
approach of desirable efficiency for determination
of a genomic sequence, one needs to store in a
computer memory 10^5 to 10^6 bits of binary
information per second. Since the matrix of the
informational genomic approach consists of 10^7
targets x 10^5 ONPs, this speed can be theoretically
attained at the two extremes: by reading a submatrix
of 1 (10) targets x all ONPs per second, or 10^5-10^6
targets x 1 ONP. Of course, all other more or less
square submatrices are possible as well. From a
practical standpoint it is simpler not to prepare
those matrices which are based on the extremes of
one or the other component. For instance, if only a
matrix with a single ONP is used, one would have to
perform 100000 separate hybridizations. On the
other hand, it is difficult to perform simultaneous
hybridization followed by recognition of 100000
ONPs. Therefore, parallel formation of many
submatrices of the type 10^4 to 10^5 targets x
10-100 ONPs seems to us the most rational way to
proceed. The parallelism can be of two kinds: a
formation of all 100-1000 submatrices with the
same group of ONPs on a single HA, and
simultaneous hybridization in separate vessels on HA
"replicas" with many different ONP groups. From
the standpoint of the number of separate
hybridizations, it would be more favourable to use
groups of 100 ONPs, which would require only 1000
separate hybridization reactions and probably as
many separate ONP syntheses (see section on ONP
synthesis in a mix). On the other hand, 10^4-10^5
targets
require about a million pixels in the electronic
cameras by means of which efficient image analysis
is achieved. CCD cameras can have from 650000 to
1.3 million pixels of about 10 x 10 µm and,
therefore, the matrix suggested can be looked upon
as a single picture. The sizes of pixels and of DPs
are not directly dependent on each other because of
the possibility of using an optical microscope. For
image analysis one can use DPs of 0.1 µm size.
The speed of image analysis possessed by present
CCD cameras is 50000 pixels per second, and this is
about 20 times slower than the requirement of 1
million pixels per second. The speed indicated
includes "photographic recording", digitization and
storing in a computer memory. Probably technical
characteristics rather than theoretical limitations
are the limiting factor. With a more powerful
device fast digitization can be attained,
especially when no large digital resolution is
needed. Kodak announced a camera which reads 1
million pixels with 8 bits (discrimination of 256
levels of signal intensities) in 10 shifts (10
different images) per second. On the other hand, it
should be pointed out that the hybridization image
does not have to be seen or reconstructed on a
display, and this fact means a time saving.
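
The data rates discussed in this section follow from the size of
the target x ONP matrix. The sketch below only retraces that
arithmetic, treating each matrix element as one bit; the variable
names are hypothetical.

```python
SECONDS_PER_DAY = 86_400

targets = 10 ** 7
probes = 10 ** 5
matrix_bits = targets * probes   # one bit per target x ONP pair

# Time to fill the whole matrix at the two storage rates mentioned.
days_at_rate = {rate: matrix_bits / rate / SECONDS_PER_DAY
                for rate in (10 ** 5, 10 ** 6)}

ccd_pixels_per_second = 50_000
required_pixels_per_second = 10 ** 6
slowdown = required_pixels_per_second / ccd_pixels_per_second

levels_8_bit = 2 ** 8            # intensity levels distinguishable with 8 bits

for rate, days in days_at_rate.items():
    print(f"{rate:.0e} bits/s -> {days:.1f} days for {matrix_bits:.0e} bits")
print("present CCD speed is slower by a factor of", slowdown)
print("levels with 8 bits:", levels_8_bit)
```

At the upper rate of 10^6 bits per second the complete matrix
corresponds to less than two weeks of continuous reading.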

In the most suitable submatrix there is a
formidable problem. It is the simultaneous
hybridization with 10-100 ONPs, and to achieve this
one needs that many labels recognizable in a
mixture. One should also keep in mind that in SBH
the mathematical expectation is that each probe will
hybridize with 10% of clones (or even less in
GENOGRAM). This means that with a mixture of 100
ONPs approximately 10 ONPs would hybridize to each
dot or DP. There are two experimentally confirmed
approaches which can contribute to solving this
problem.
One is the use of different fluorescent molecules,
and the other is gas chromatography coupled with
mass spectroscopy. In the former it is difficult to
imagine the use of more than 10 different
fluorescent molecules, and for their detection (on a
single DP) one also needs exciting light of
different wavelengths or filters. Every change of
wavelength or filters is an additional physical
operation, making optimization possible only in
hybridization and not in image analysis. However,
the extreme precision and sensitivity of CCD
cameras (down to two photons per second) lend to
this approach great possibilities. The second
approach can potentially discriminate even 1000
labels, but is suitable (or possible) only on a
single or a small number of samples. For this
methodology to work one would need to develop a
technology of parallel acquisition of total data
from 100-1000 samples per second. This is difficult
to blend with the use of unordered microsamples in
the form of DP.

The simplest task for image analysis is to
discriminate between a great number of objects on
the basis of their physical characteristics such as
size, shape and color. These characteristics are
reduced to different photon patterns, in contrast to
the defined photons emitted by fluorescent
molecules. This permits the recognition of an
"unlimited" number of non-overlapping objects.
Therefore, the recognition of hundreds of physically
different DPs in image analysis must be simple. One
can ask whether this principle of target labeling
can be used for ONP recognition (labeling) as well.
Two principles can be applied:

1) ONP carries a physical entity recogniz- 
able by optical microscope and thus usable in 
image analysis, and 

2) ONP carries a chemical entity which can 
be used after hybridization as an initiator for lo- 
calized formation of a specific physical entity. 

The simplest application of the first principle
is a double DP system in which ONPs are bound to
DPs which can be mutually discriminated according
to physical characteristics. Positive hybridization
would lead to rosette formation. The target DP
would be surrounded with DPs carrying ONPs whose
complementary sequences are present in the given
target. This principle of visualization of a
bimolecular recognition reaction is employed in
immunology, where a positive antibody-antigen
reaction is recognized by the rosettes formed by
the appropriate cells. The basic difficulty is
whether ONP hybridization can lead to the formation
of a sufficient number of chemical bonds whose
energies are strong enough to hold the two DPs
together. One should not forget that the applied
system must allow simultaneous discriminative
hybridization on the level of one base pair
mismatch.

The second approach does not require linking
of DPs by chemical bonds formed in hybridization.
Its problem is how an initiating event on an ONP
can be transformed into a locally recognizable
physical character. One should probably take
advantage of a local concentration of a certain
reagent (for instance, a certain metal ion). One can
rephrase the question in the following way: how
can one transform, in one or several reactions, 10
different ions, distributed locally, into localized,
physically different entities? One can speculate on
the ionic initiation of chemical interactions on the
surface of target DPs with specific DPs added to
the system. The other possibility is the initiation
of local formation of specific, recognizable
microcrystals.


Advantages of the solution described 

These solutions represent an attempt to define
a more rational approach to the development of the
methods necessary for the resolution of central
problems of molecular biology, such as determination
of genome primary structure, determination of
regulation and self-regulation of biological
systems, treatment of cancer and others.
Theoretical treatment of the problem and comparison
of the properties of all theoretically possible
approaches have the goal of discarding some methods
and procedures as inefficient or impossible, and of
initiating and encouraging the development of
procedures which are more efficient. Thus, one can
pose the question why the PCR reaction was not
"discovered" earlier, when all its material
components were known. It is tempting to assume in
retrospect that a thorough theoretical analysis of
the ways of amplification of single DNA fragments
or fragment libraries could have predicted PCR with
its now plainly obvious advantage. Would the
existence of such a theoretical concept have led to
earlier application of PCR? Due to its complexity
and size, the human genome project inevitably needs
theoretical treatment of methodological
requirements and of the ways they can be satisfied.
Empirical discovery of more efficient approaches is
not possible, because a genome cannot be subjected
to a minor experiment.

The INFORMATIONAL PRINCIPLE defined here is
based on the use of ON words, as is the case with
the efficient algorithms for sequence comparisons.
For the moment, there are two (experimentally
confirmed) natural molecular processes for
determination of ONs content: recognition on the
basis of complementarity of NA and the specific
recognition used by some proteins. The first process
is more general, because any sequence can be
recognized, and probably easier due to NA
stability. In light of this, we believe that ONP
hybridization will have a central role in the
compilation of genomic sequences. The basic
informational characteristic of ONP hybridization is
a determination of ONP contents, and this is
probably the best way for recognition of ONs.

The fundamental principle allows the technological
advantages of unbroken parallelism (the samples
travel together from genomic DNA to IMAGE
ANALYSIS) and amplification cascades (100 HA x
10 genomic parts x 10^5-10^6 DP x 10 washes x 100
ONPs/hybridization x 10 days = 10^13 unit
information data). This can be called a parallelism
of parallel processes, i.e. quadratic parallelism. A
small number of operations is performed in each
step, but due to multiplication and not summation
of the number of unit information bits per step, the
total yield of information bits is enormously large.
One
can also make use of the technological advantage
secured by maximizing the use of resources and
materials prepared beforehand that are identical
for different samples, and by minimizing
sample-specific treatment. These functional cassette
preparations are independent of (and usable for
any) individual object of a given species, i.e.
individual genome. The consequence is preparation
of ONPs containing integrated information of the
type basic element x position, usable for any DNA.
This also results in DP preparation integrating the
addressing information, so that it need not be
determined by robotic positioning operations. The
computer software package for generation of
sequence, once made, is reusable on new data sets
without the intervention of the
scientist-experimentalist.
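
The multiplicative character of the cascade can be made explicit by
multiplying its stages. In the sketch below the number of DPs per
HA is an assumption: 10^6 is the value that makes the product equal
to the stated total of 10^13, while 10^5 would give 10^12. The
dictionary keys are hypothetical labels for the stages.

```python
from math import prod

cascade = {
    "HA": 100,
    "genomic parts": 10,
    "DPs": 10 ** 6,              # assumed; chosen to match the stated total
    "washes": 10,
    "ONPs per hybridization": 100,
    "days": 10,
}

total_units = prod(cascade.values())
summed_units = sum(cascade.values())

print(f"multiplied stages: {total_units:.0e} unit information data")
print(f"summed stages, for contrast: {summed_units}")
```

The contrast with the plain sum of the stages shows why a small
number of operations per step still gives an enormous yield of
information.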

In the final analysis, the INFORMATIONAL APPROACH provides a
decrease of experimental requirements at the expense of the
informational computer work. Based on the several described
conceptual solutions and possible, or eventually possible,
practical procedures, one can estimate a potential for decreasing
experimental requirements. Thus, the experimental surface area
and the ONP mass are decreased by about 1000 times. A
corresponding decrease can also be expected for the necessary
mass of the samples which would probably be required in
classical sequencing and in a dot-blot system, with the vessels
for clone cultivation which are at least 100 µl each per clone.
The total volume for 100 million samples would be of the order of
several tons, and in the invented micro-hybridization several
liters. Even more important than space and material savings are
reductions in the number of robotic operations. A robotic hand
with 10000 pipetting fingers needs 1000 operations to make one
filter with 10 million dots. Using the DP system, a robotic hand
with several pipetting units (10-1000) can perform an analogous
task in a single operation. All this can lead to a
miniaturization of "genomic installations" to the size of the
larger laboratory instruments of today.
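
The orders of magnitude in this paragraph can be checked with a few lines of arithmetic. The sketch below reads the per-clone vessel size as 100 microliters and takes the density of the culture as about 1 kg per liter; both are assumptions made for illustration, and the variable names are ours.

```python
# Figures quoted in the text; vessel size read as 100 microliters (assumption).
samples = 100_000_000
vessel_volume_ul = 100

total_volume_l = samples * vessel_volume_ul / 1_000_000  # microliters -> liters
total_mass_t = total_volume_l / 1000                     # assumes ~1 kg per liter

# Robotic operations needed to spot one filter in the dot system.
pipetting_fingers = 10_000
dots_per_filter = 10_000_000
dot_operations = dots_per_filter // pipetting_fingers

print(f"clone cultivation volume: {total_volume_l:.0f} l (~{total_mass_t:.0f} t)")
print(f"dot-system operations per filter: {dot_operations}")
```

Under these assumptions the dot-blot route needs about 10^4 liters of culture, i.e. a mass on the scale of tons, and 1000 pipetting operations per filter, against a single operation in the DP system.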

The DP system in essence represents an imitation of the multitude
of biochemical reactions occurring simultaneously within a single
cell. Specificity and discreteness of cellular reactions are
based on enzyme actions, whose informational properties are
imitated here by DPs. The use of DPs requires an at least 10-fold
increase in the number of unit information bits, but time and
labour investments (preparatory and robotic operations) for
obtaining the complete data set are reduced several times. The
several examples which follow are intended to show how one can
transfer the center of operations to IMAGE ANALYSIS and thus make
it the most efficient step. In a robotized dot system, every DNA
fragment is represented in each HA only once. In the DP system
this certainty is replaced with the probability that each clone
is represented at least once in the HA. This imposes the 10-fold
increase in the total number of DPs. On the other hand, this and
even bigger increases allow the tolerance of imperfect
hybridization on individual DPs, i.e. statistical determination
of positive hybridization. This means, in the last instance, the
reduction of required experimental performance levels. Therefore,
the DP based hybridization and signal reading procedures must
tolerate libraries consisting of a larger number of fragments
which, in turn, allows the use of a smaller number of shorter
ONPs. This is especially evident in the GENOGRAM application.
Instead of specifically choosing 10000 pairs of primers and ONPs,
it is more efficient to perform hybridization of all amplified
fragments with all ONPs. The advantage is further strengthened by
the realization that the ensuing surplus of information means
higher accuracy in polymorphism determination as well as the
possibility of detection of new mutations, both of which can be
of considerable diagnostic value. By switching the emphasis to
the IMAGE ANALYSIS or, in other words, by decreasing the volume
of experimental work, it is possible to obtain large numbers of
detailed GENOGRAMS according to the same principle as in
determination of the entire genomic sequence.
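
The trade of certainty for probability can be made concrete with a simple model. The sketch below assumes that DPs carry clones at random, so that the number of DPs per clone is approximately Poisson distributed with a mean equal to the redundancy (DPs per clone). The model, the function names and the library size of one million clones are illustrative assumptions, not figures from the text.

```python
import math

def prob_clone_missing(redundancy: float) -> float:
    """Poisson probability that a given clone is carried by no DP in the HA."""
    return math.exp(-redundancy)

def expected_missing(n_clones: int, redundancy: float) -> float:
    """Expected number of clones absent from the HA."""
    return n_clones * prob_clone_missing(redundancy)

# Compare no excess, a 3-fold excess and the 10-fold excess named in the text.
for redundancy in (1, 3, 10):
    p = prob_clone_missing(redundancy)
    missing = expected_missing(1_000_000, redundancy)
    print(f"{redundancy:>2}-fold: P(missing) = {p:.2e}, expected missing = {missing:.0f}")
```

Under this model a 10-fold excess of DPs leaves only a few clones in a hundred thousand unrepresented, which is consistent with the text treating the 10-fold increase as sufficient.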

The use of these characteristics of the INFORMATIONAL APPROACH
provides as a final result, besides miniaturization, a greater
speed in comparison with the processes requiring experimental
gathering of various and more complex data bits.

It is interesting to attempt to outline in a comparative analysis
the advantages and disadvantages of the INFORMATIONAL APPROACH
for genomic sequencing versus the three procedures which use the
EXPERIMENTAL APPROACH consisting of position determining methods.
The standard method used up to now, which is based on finding the
position by measuring the length of DNA fragments, has two
requirements which almost certainly exclude it as a method of
choice. These are the practical impossibility of miniaturization
and the need for use of amplified fragments of genomic DNA. The
other two methods, which like SBH or other procedures using the
INFORMATIONAL APPROACH have not been experimentally verified so
far, do not impose these requirements. The tunneling electron
microscopy, used as a tool for direct reading, is an inherently
miniaturized procedure which does not require amplification of a
DNA fragment. On the other hand, the sequential removal of base
by base from one end of a DNA fragment, followed by continual
separation by flow and efficient registration in passage by the
detector, for practical reasons almost certainly requires the use
and detection on the level of one molecule. Probably, it is very
difficult, or maybe even impossible, to synchronize removal of
the same nucleotide on the level of moles of the fragments. It
can be imagined that this approach can be miniaturized and made
parallel, since instead of addressed reactions, multiple
microtubes ensuring discreteness can be used. In addition,
separation of removed nucleotides by a water flow does not
require macro-separation, as is the case in separation of DNA
fragments with an accuracy to the level of one base on acrylamide
gel. The main requirement of this approach is precision and speed
of detection of single events, especially parallel detection in
large numbers of microtubes. The question is whether the use of
lasers and fluorescence labeling combined with pixel based image
analysis can allow an acceptable data acquisition speed with
non-prohibitively complex equipment.

In any case, both procedures rely on achievements in physics,
while the INFORMATIONAL APPROACH is exclusively based on
biochemical, molecular processes. That is so because in SBH, as
indicated here, one can arrive at the experimental image and
IMAGE ANALYSIS with minimal technical requirements. Since there
is an indirect detection of molecular reactions, SBH does not
have to have atoms, and since it does not use position
information, SBH does not require any physical ordering of
reactions, which allows the use of amplified fragments.

All methods have a common last step including image analysis of
the "experimental image". The question is what is the ease of
arriving at the analysis. It appears to us that SBH is more
adapted, more efficient for sequencing a large number of complex
genomes. Due to its requirements for preparation of ONPs and DPs,
SBH is valuable, and perhaps more efficient for sequencing on a
large scale in comparison with other methods. The reduction of
individual genomes to a common denominator, ONs, allows the use
of informational work after image analysis for sequence
generation. The entire work in the non-informational approach is
of the experimental character.



Claims 

1. Process for determination of partial or entire nucleic acid
sequence by the hybridization of the samples in a mixture,
characterized in that multiplied or synthesized or separated DNA
or RNA molecules in separate reactions are bound to discrete
particles (DPs) of microscopic size which can be discriminated
according to physical and chemical characteristics thereof, DPs
are mixed, then they are hybridized with an individual or with a
group of probes which are natural or multiplied or separated DNA
or RNA molecules, and the result of hybridization on the
individual samples is detected either by an ordered flow of DPs
one by one as they are passing by a detector, or by forming a
monolayer spread of DPs which permits detection by image
analysis.

2. Process according to Claim 1, characterized in that natural or
multiplied or synthesized or separated DNA or RNA molecules are
linked in separate reactions to discrete particles (DPs) which
can be, but not necessarily, discriminated according to physical
and biochemical characteristics thereof, DPs are mixed, the
mixture spread into one big or several smaller separated areas on
a solid support, after which DPs are fixed to the support.

3. Process according to any of the preceding claims,
characterized in that from the same number of preparations of
different oligonucleotides of the known formula different
mixtures are prepared using combinations of a certain number of
starting oligonucleotides, each mixture is bound in separate
reactions to DPs, and DPs carrying the same combination of
oligonucleotides are recognized through a hybridization with
oligo probes which are complementary to starting
oligonucleotides.

4. Process according to any of the preceding claims,
characterized in that in a small number of the reactions DP
mixtures carrying specific combinations of oligonucleotide
targets, which are used for recognition of DPs, are made,
oligonucleotide targets are divided into a specific number of
groups and combinations from each group are formed and placed in
the separate tubes with a specific number of different
oligonucleotide targets from the given group, then either
identical DPs are added in the tubes with combinations or an
equimolar ratio of DPs which can be discriminated in each tube
according to physical characteristics thereof is added, binding
of oligonucleotide targets to DP is carried out, DPs from all
reactions are mixed either equimolarly or in specific cases in
the specific ratio, then they are divided into the tubes with
combinations of oligotargets from another group, and the cycle of
mixing and dividing DPs and binding of the combinations of
oligotargets to DPs is repeated as many times as there are groups
of oligotargets.

5. Process according to Claims 3 and 4, characterized in that DP
besides certain combinations of the oligonucleotide targets or
some other marker contains functional oligonucleotide of defined
length, the length of the functional oligonucleotide which is
common for all of the DPs in a given reaction is synthesized,
before, during or after the binding of the given oligotarget
combination to DP or other marker, in each cycle in a given tube
with the specific oligotarget combination in that way that in a
successive reaction of the binding of the oligotarget
combinations, or other markers, continues the synthesis of a
needed part of the functional oligonucleotide on the part
synthesized in the previous cycle, or the entire given part which
is synthesized in the independent process binds to the part
synthesized, or bound in the previous cycle, that is, binds to
the DP in the first cycle.

6. Process according to Claims 1, 2 and 5, characterized in that
the hybridization surface (area) consists of solid support with a
fixed monolayer spread of DPs with characteristics made in a
manner that in DPs is represented the informative part of or all
possible different oligonucleotides of certain length/s, the
position is determined of DPs with each and every functional
nucleotide by hybridization with oligotargets which are used for
forming of the combinations on the DPs, or in some other way, if
the DPs are not labeled with the oligotarget combinations.

7. Process according to Claims 1 and 6, characterized in that a
sufficient mass of the given nucleic acid sample whose total
complexity is not too high for the functional oligonucleotide
length in a given hybridization surface (area) is cut in a random
process in very short fragments, although longer than the
functional oligonucleotides, generated fragments are then
labeled, discriminative hybridization with the hybridizational
surface (area) with the characteristics is performed, in the
process of microscopic image analysis of the given
hybridizational surface (area) is determined on which DP the
positive hybridization did take place, obtained information, on
the basis of the information from the given surface (area) on the
position of the DP with the given functional oligonucleotide, is
translated into content of the oligonucleotide sequences in the
given nucleic acid sample and finally by computer analysis
partial or total nucleic acid sequence in the given sample is
obtained.

8. Process according to Claims 6 and 7, characterized in that for
the determination and tracking of the heredity of the large
number of the genomic or gene polymorphic sequences for
identification of the person, determination of the relatedness or
evolutionary distance, detection of the changes on the genome and
genes, prenatal and postnatal prediction of the phenotype
characteristics, determination of the biological function of the
individual genes or gene complexes by determination of the
sequence, a certain fragment, or total human DNA of the person,
or individual cell, or mRNA, or cDNA of the certain tissue or
group of the tissues, is processed and hybridized with
hybridization surfaces that are containing sufficient number of
different functional oligonucleotides, so that in the largest
number of cases have complementary sequences on a single point in
a given nucleic acids sample, and from pattern differences
between samples from individuals, the polymorphic sequences are
determined, whose haplotype combinations in a new sample could be
determined by applying the same procedure.

9. Process according to Claim 1, characterized in that
oligonucleotide probes in a certain number of moles are bound to
visually recognizable discrete particles (DP) that are different
from probe to probe so that one can apply a mixture of
oligonucleotide probes, and hybridization event after suitable
hybridization washing is recognized as a rosette of discrete
particles containing corresponding probe bound to discrete
particle or a point on the solid support where a target
containing complementary sequence is placed.

10. Process according to Claim 1, characterized in that for
identification of the uncloned genes or gene families and
parallel investigation of the place, time and modulation of the
total gene expression by means of determination of the sequence
of the mRNA or cDNA prepared from a certain tissue, certain cell
cultures, a certain tissue at a certain stage of the ontogenic
development, or cell cultures or tissues after the influence of
certain environmental agents, are hybridized with the sufficient
number of a single or groups of the oligonucleotide probes of a
length of 4 to 12 bases, and on the basis of the detected
oligonucleotide sequence content, the pattern (profile, stage) of
the expression or relatedness of the genes is determined.

11. Process according to Claim 11, characterized in that for
applications for determination of the partial or the total
sequence, the genomic DNA, mRNA or cDNA library of the given
biological sample (DNA, mRNA or cDNA) are bound to the DPs and
hybridized with the part, or all of the oligonucleotide probes of
the length of 4 to 20 bases, and on the basis of the detected
content of the oligonucleotide sequences by means of the computer
processing, a partial or total sequence of the individual clones
is obtained, and thereby the sequence of the given nucleic acid
sample.

12. Process of forming of the library of the discrete particles
(DPs) for determination of the partial or the total nucleic acid
sequence by hybridization of the samples in a mixture according
to Claims 10 and 11, characterized in that DPs are carrying a
single molecule or a certain molarity of the same or mostly
overlapped fragments of the genomic desoxyribonucleic acid (DNA)
or ribonucleic acid (RNA), so that the library contains a certain
number of discrete particles with the same nucleic acid
molecules, and as the whole the sufficient number with different
molecules so that the contents of the sequences in a starting
biological samples are represented, by using the mixture of the
DPs containing functional oligonucleotides whose sequences appear
only once or are non-existent in a starting nucleic acid sample
in which every sequence is represented in a large number of
moles, by means of hybridization process, a sorting of the
nucleic acid fragments is performed, as well as their fixation to
the DPs afterwards, so that all of the DPs with the same
functional oligonucleotides contain fragments with the same,
mostly overlapped or very similar sequences from libraries.

13. Process according to Claim 12, characterized in that to the
genomic fragments of appropriate sizes are enzymatically bound on
both ends the same or different short fragment of the
desoxyribonucleic acid (DNA), followed by sufficient dilution of
the fragments, so that the forming is permitted of the separate
samples containing a single, or no molecules, and in vitro
enzymatic amplification is performed by polymerase chain reaction
using primers that are complementary to the ligated DNA
fragments, or by using of ribonucleic acid (RNA) polymerases, if
the ligated fragments are promotor sequences.

14. Process according to Claim 12, characterized in that the
library formation of the discrete particles (DP) is performed by
using amplification reactions in which conglomerates of the DPs
by random process are enclosed with the certain amount of the
amplification mixture with one or none genomic desoxyribonucleic
acid (DNA) fragments, or complementary DNA or ribonucleic acid
(RNA) into membranes impermeable for the macromolecules, followed
by an amplification, DNA is fixed to the DPs, followed by
disruption of the membranes and conglomerates resulting in the
individual DPs mixture in which majority of the DPs contain a
large number of copies of the same fragment of the genomic DNA,
complementary DNA, or RNA.

15. Discrete particles according to Claims 1 to 14, characterized
in that they possess the same or different physical or chemical
characteristics and are containing combinations of the different
oligonucleotides of the known formulas that are represented
either as an individual molecule of each, or the certain molarity
of each, so that given oligonucleotide combination serves for
discrete particles recognition by hybridization, or in any other
obvious way.

16. Discrete particle (DP) conglomerates according to Claim 14,
characterized in that DPs in conglomerate have the same or
similar physical or chemical characteristics and are bound
together by weak physical or chemical bonds, thus enabling easy
disassembling to individual DPs, the discrete particles between
different conglomerates are recognized by size, shape, colour,
chemical properties, or by oligonucleotide combinations or in any
other obvious way.

17. The mixture of the discrete particles (DPs) according to
Claims 3 and 6, characterized in that every DP in the mixture
contains one functional oligonucleotide as a single molecule, or
in a certain molarity, and DP is represented in the mixture once
or several times, DPs with the same oligonucleotide possess the
same physical or chemical characteristics, and discrete particles
containing a different oligonucleotide can, by physical or
chemical characteristics, be identical or different by size,
shape and colour, or can contain different oligonucleotide
combination, or in any other obvious way.

18. Genomic DNA fragments, cDNA, cRNA molecules and their
sequences, characterized in that they are identified, or
isolated, or that their sequence is determined, by using
processes according to Claims 1 to 14.
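
The final computer analysis step named in Claims 7 and 11, going from detected oligonucleotide content to a sequence, can be illustrated with a toy example. This is a minimal sketch under strong assumptions and not the procedure of the application: hybridization is taken to be error-free, every overlap of k-1 bases is taken to occur only once in the target, and the function names and the example sequence are hypothetical.

```python
def kmer_content(sequence: str, k: int) -> set:
    """Set of all k-base oligonucleotides present in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def reconstruct(kmers: set) -> str:
    """Chain k-mers that overlap by k-1 bases into one sequence.

    Assumes every (k-1)-base prefix and suffix is unique, so that each
    extension step has exactly one candidate.
    """
    by_prefix = {kmer[:-1]: kmer for kmer in kmers}
    suffixes = {kmer[1:] for kmer in kmers}
    if len(by_prefix) != len(kmers) or len(suffixes) != len(kmers):
        raise ValueError("repeated overlap: reconstruction is ambiguous")

    # The first k-mer is the only one whose prefix is not a suffix of another.
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting oligonucleotide")

    current = starts[0]
    sequence = current
    for _ in range(len(kmers) - 1):
        current = by_prefix.get(current[1:])
        if current is None:
            raise ValueError("chain of overlaps is broken")
        sequence += current[-1]  # each step contributes one new base
    return sequence

target = "ATGGCGTGCA"                 # hypothetical fragment
positives = kmer_content(target, 4)   # stands in for the positive probes
print(reconstruct(positives))
```

In this toy case the seven positive 4-mers chain back into the original ten-base fragment. With real data, repeated overlaps and imperfect hybridization make the problem ambiguous, which is why the sketch raises an error instead of guessing.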
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